% Chapter 6

\chapter{Conclusion}
\label{Chapter6}
\lhead{Chapter 6. \emph{Conclusion}}

This project has achieved its objective of producing a fully functional prototype of an online local newspaper generator. To the best of the author's knowledge, the final product provides a service that is not currently available elsewhere on the Internet.

Below is a review of the principal achievements in comparison to the initial aims set out at the beginning of this thesis.

\begin{itemize}
	\item A novel approach to extracting the content of an online news article has been proposed and implemented, achieving a 97.5\% success rate. 
	
	This is superior to most other proposed methods \cite{vs,treeedit} and, crucially, requires no training or prior knowledge of the source. The only other published method known to perform better \cite{percep} requires patented technology, which would limit the commercial possibilities of any product based on it.
	
	\item A topic classification method based on Support Vector Machines has been implemented to automatically organise articles into the four major categories that would be found in any print newspaper and therefore expected in any online news aggregator.
	
	This was achieved with 93\% accuracy when tested within the current scope of the training data. The method has been designed to be flexible in the choice and number of topics included, so that topics can be added or removed without significantly affecting classification accuracy.
	
	\item A novel approach to location classification, also based on Support Vector Machines, has been proposed and implemented. By using training data instead of pre-defined geographical taxonomies, it automatically detects the keywords that best associate an article with a particular location.
	
	Although limited by the training data provided, it has been designed, like the topic classifier, to allow new classes to be added easily. The current scope of the classifier can therefore be expanded automatically by including more training sets.
	
	\item A ranking method has been developed to measure the relevance of each article to the desired location and to order the returned list of articles accordingly.
	
	The combination of techniques used in this method allows a rank to be provided even for queries on which the location classifier has not been trained. Adding more classes to the location classifier would greatly increase the number of features detected, allowing more accurate ranking to be performed.
	
	\item An efficient data structure has been constructed to store information on thousands of UK-based towns and cities. This information is used to quickly and accurately obtain the nearest neighbouring locations in order to expand the scope of the initial search query.
	
	This unique feature allows for a full and relevant newspaper to be generated for any location, regardless of how little news is available for that particular, narrow location. Although currently only featuring UK-based locations, were the product expanded to cover multiple countries, its text-based input system would be able to automatically incorporate additional geographical information.
\end{itemize}

\section{Future Work}

\begin{itemize}
	
	\item \textbf{Scope Expansion:} The acquisition of training sets for a greater number of locations would increase the scope of the product and provide more accurate location classification. Although a tedious task, obtaining this training data is a relatively simple process.
	
	\item \textbf{Alternative Document Representation:} Gabrilovich et al.\ \cite{world}, as well as Wang and Domeniconi \cite{wiki}, have proposed alternatives to the bag-of-words document representation (see Section \ref{3bog}).
	
	These representations use collective knowledge, acquired from structured online databases such as Wikipedia\footnote{Wikipedia, URL: \url{http://en.wikipedia.org/} [Last Accessed: 25/04/09]}, to embed background knowledge into the document representation. As a result, significantly less training data is required to accurately represent a class, which would ease the process of acquiring new training sets.
	
	\item \textbf{Pre-processing articles:} As discussed in Section \ref{colart}, if this prototype were to be expanded into a fully operational online news service, the collection process would have to be altered in order to pre-process articles, rather than acquire them on a per-query basis. This would involve classifying articles at the time of discovery, then storing them within a database for later retrieval. 
	
	\item \textbf{Inclusion of additional services:} A simple expansion of the product would be to include additional services that would be useful when inquiring about a local area. 
	
	An example would be to provide the weather forecast for the desired location. This has already been investigated and found to be feasible using the Yahoo Weather RSS feeds.\footnote{Yahoo Weather RSS feeds, URL: \url{http://developer.yahoo.com/weather/} [Last Accessed: 24/05/09]}
	
	\item \textbf{Producing a newspaper in traditional print format:} With the current prototype producing the output as a web page, a possible extension to the rendering stage would be to display articles in a traditional newspaper format. 
	
	This, however, is not a trivial task, as it requires the analysis of layout aesthetics, a process which is difficult to automate \cite{buhr}. Several methods have been proposed \cite{layout,newsdoc} but have not proved successful. Attempting this process would therefore involve further research and development.
\end{itemize}
